Learning Sequential Patterns for Lipreading
Authors
Abstract
This paper presents a machine learning approach to lip reading and proposes a novel learning technique, sequential pattern boosting, that efficiently searches for and combines temporal patterns to form strong spatio-temporal classifiers. Automatic lip reading must address the fact that the problem is inherently temporal, so it is crucial to model and use spatio-temporal information. To achieve this, we use sequential patterns, which are ordered sequences of feature subsets. Sequential patterns form weak classifiers that are combined into a strong spatio-temporal classifier by means of boosting. A boosted classifier is a linear combination of a number ($S$) of selected weak classifiers and takes the form $H(I) = \sum_{i=1}^{S} \alpha_i h_i(I)$. The weak classifiers $h_i$ are selected iteratively based on weights formed during training. To determine the optimal weak classifier at each boosting iteration, the common approach is to exhaustively search the entire set of candidate weak classifiers. However, when dealing with sequential patterns, the number of weak classifiers becomes too large. Specifically, given $D$ features, sequential patterns up to length $N$, and a maximum of $K$ items, the number of weak classifiers is a power of a binomial coefficient: $\binom{D}{K}^N$. In the experiments performed here ($D = 900$, $K = 3$, $N = 7$), the total number of weak classifiers is $5 \times 10^{58}$. To address this, we propose a novel boosting algorithm called Sequential Pattern Boosting (SPBoost). Firstly, we define sequential patterns as follows:
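As a rough illustration of the ideas in the abstract, and not the authors' definition or implementation, the sketch below assumes the usual sequential-pattern-mining notion of containment: a pattern (an ordered tuple of feature subsets) is present in an input sequence if its subsets can be matched, in order, to strictly later and later frames. It also assumes each weak classifier outputs 1 when its pattern is present and 0 otherwise. The names `contains_pattern`, `strong_score`, and `search_space_size` are hypothetical. The last function reads the abstract's count as "N ordered subsets, each of K out of D features".

```python
from math import comb
from typing import FrozenSet, Sequence, Tuple

Frame = FrozenSet[int]          # indices of the features active in one frame
Pattern = Tuple[Frame, ...]     # ordered sequence of feature subsets


def contains_pattern(sequence: Sequence[Frame], pattern: Pattern) -> bool:
    """Return True if `pattern` occurs in `sequence` as an ordered subsequence.

    Assumption: each pattern subset must be contained in a distinct frame,
    and the matched frames must appear in the same order as the subsets.
    Greedy earliest matching is sufficient for this test.
    """
    matched = 0
    for frame in sequence:
        if matched < len(pattern) and pattern[matched] <= frame:
            matched += 1  # at most one pattern subset is matched per frame
    return matched == len(pattern)


def strong_score(sequence: Sequence[Frame],
                 patterns: Sequence[Pattern],
                 alphas: Sequence[float]) -> float:
    """Evaluate H(I) = sum_i alpha_i * h_i(I) with h_i(I) in {0, 1}."""
    return sum(a * contains_pattern(sequence, p)
               for p, a in zip(patterns, alphas))


def search_space_size(d: int, k: int, n: int) -> int:
    """Number of patterns made of n subsets, each with k of d features."""
    return comb(d, k) ** n


if __name__ == "__main__":
    seq = [frozenset({1, 2}), frozenset({3}), frozenset({2, 4}), frozenset({5})]
    patterns = [
        (frozenset({1}), frozenset({4})),   # present: frame 0, then frame 2
        (frozenset({4}), frozenset({1})),   # absent: wrong temporal order
        (frozenset({2}), frozenset({2})),   # present: frame 0, then frame 2
    ]
    alphas = [0.5, 1.0, 0.25]
    print(strong_score(seq, patterns, alphas))
    print(search_space_size(4, 2, 3))
```

Even in this toy setting the candidate set grows quickly with pattern length, which is why an exhaustive search over weak classifiers at every boosting iteration becomes impractical.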
Related resources
Temporal Pattern Classification using Spiking Neural Networks
A novel supervised learning rule is derived for Spiking Neural Networks (SNNs) using the gradient descent method, which can be applied to networks with a multi-layered architecture. All existing learning rules for SNNs limit the spiking neurons to fire only once. Our algorithm, however, is specifically designed to cope with neurons that fire multiple spikes, taking full advantage of the capabilities...
Using Surface-Learning to improve Speech Recognition with Lipreading
We explore multimodal recognition by combining visual lipreading with acoustic speech recognition. We show that combining the visual and acoustic cues of speech improves recognition performance significantly, especially in noisy environments. We achieve this with a hybrid speech recognition architecture, consisting of a new visual learning and tracking mechanism, a channel robust acoustic ...
Second-order Methods in Boltzmann Learning: an Application to Speechreading
We introduce second-order methods for training and pruning of general Boltzmann networks trained with cross-entropy error. In particular, we derive the second derivatives for the entropic cost function. We illustrate pruning on Boltzmann zippers, applied to real-world data: a speechreading (lipreading) problem.
Does Fundraising Have Meaningful Sequential Patterns? The Case of Fintech Startups
Nowadays, fundraising is one of the most important issues for both Fintech investors and startups. The pattern of fundraising, in terms of the “number and type of rounds and stages needed”, is important. The diverse features and factors stemming from Fintech business models that can influence success are among the key issues in shaping these patterns. This study applied the top 100 KPMG Fintech...
Surface Learning with Applications to Lipreading
Most connectionist research has focused on learning mappings from one space to another (e.g. classification and regression). This paper introduces the more general task of learning constraint surfaces. It describes a simple but powerful architecture for learning and manipulating nonlinear surfaces from data. We demonstrate the technique on low dimensional synthetic surfaces and compare it to near...